
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. #22126

Merged: 17 commits into ray-project:master on Feb 8, 2022

Conversation

sven1977 (Contributor) commented Feb 4, 2022

This PR:

  • Provides a new `training_iteration` function for A3C, as an alternative to the existing `execution_plan` (see the sketch after this list).
  • Uses that new iteration function by default (`_disable_execution_plan_api=True`).
  • ~3x speedup on tuned_examples/a3c/pong-a3c.yaml (16 workers, LSTM+CNN Atari problem).
  • As a consequence of this speedup, A3C learns the Pong problem again with an LSTM -> reinstates the previously commented-out weekly learning test case (similar to tuned_examples/a3c/pong-a3c.yaml).
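The diff itself is not shown on this page, so the following is only a minimal, self-contained sketch (plain Ray, not the actual RLlib code) of the asynchronous pattern a `training_iteration`-style A3C loop relies on: workers compute gradients in parallel, the driver applies whichever result arrives first, and the updated weights are synced back to that one worker only. All class and variable names below are illustrative, not RLlib APIs.

import ray

ray.init()


@ray.remote
class Worker:
    """Stand-in for a rollout worker that samples and computes gradients."""

    def __init__(self):
        self.weights = 0.0

    def set_weights(self, weights):
        self.weights = weights

    def compute_gradient(self):
        # Stand-in for sample() + compute_gradients() on a real rollout worker.
        return 0.1


workers = [Worker.remote() for _ in range(4)]
weights = 0.0

# Keep exactly one in-flight gradient request per worker.
in_flight = {w.compute_gradient.remote(): w for w in workers}

for _ in range(20):
    # Process results as they arrive (asynchronous, A3C-style) instead of
    # waiting for all workers, as a synchronous execution plan would.
    done, _ = ray.wait(list(in_flight), num_returns=1)
    ref = done[0]
    worker = in_flight.pop(ref)
    grad = ray.get(ref)

    # Apply the gradient on the driver (local worker).
    weights -= 0.01 * grad

    # Sync the updated weights back to the particular worker that sent the
    # gradient, then immediately launch its next request.
    worker.set_weights.remote(weights)
    in_flight[worker.compute_gradient.remote()] = worker

Given the description, the old `execution_plan` path presumably remains available by setting `_disable_execution_plan_api=False` in the trainer config.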

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


# Synch updated weights back to the particular worker.
with self._timers[SYNCH_WORKER_WEIGHTS_TIMER]:
    weights = local_worker.get_weights(local_worker.get_policies_to_train())
Reviewer comment (Member):

Nice

if global_vars:
    local_worker.set_global_vars(global_vars)

# TODO: If we have processed more than one gradients
Reviewer comment (Member):

So to be clear, we haven't written to `result` in this PR, right? But we want to, for logging purposes.

Author reply (sven1977, Contributor):

This is still WIP. I need to add proper compilation of the results dict. The only missing piece is to combine the learner stats from all workers that returned something from the `async_parallel_requests` call further above; the current shim implementation only returns the last one.
Let me finish this before merging, of course.
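One simple way to combine those per-worker learner stats, sketched here purely as an illustration (not necessarily what the final PR implements), is to average each numeric entry across all workers that answered the asynchronous request:

from collections import defaultdict


def combine_learner_stats(stats_dicts):
    """Average numeric learner stats across per-worker result dicts."""
    sums, counts = defaultdict(float), defaultdict(int)
    for stats in stats_dicts:
        for key, value in stats.items():
            if isinstance(value, (int, float)):
                sums[key] += value
                counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Example: learner stats from three workers that returned something.
print(combine_learner_stats([
    {"policy_loss": 0.5, "vf_loss": 1.2},
    {"policy_loss": 0.3, "vf_loss": 1.0},
    {"policy_loss": 0.4, "vf_loss": 1.1},
]))
# -> approximately {'policy_loss': 0.4, 'vf_loss': 1.1}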

avnishn (Member) commented Feb 4, 2022

Merge pending results dict.

sven1977 merged commit ac3e6ab into ray-project:master on Feb 8, 2022
wuisawesome added a commit that referenced this pull request Feb 9, 2022
…on instead of `execution_plan`) and re-instate Pong learning test. (#22126)"

This reverts commit ac3e6ab.
wuisawesome added a commit that referenced this pull request Feb 9, 2022
…on instead of `execution_plan`) and re-instate Pong learning test." (#22250)

Reverts #22126

Breaks rllib:tests/test_io
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
…ad of `execution_plan`) and re-instate Pong learning test. (ray-project#22126)
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
…on instead of `execution_plan`) and re-instate Pong learning test." (ray-project#22250)

Reverts ray-project#22126

Breaks rllib:tests/test_io